QL Assembly Language Mailing List 




Issue 7 


Norman Dunbar 




PUBLISHED BY MEMYSELFEYE PUBLISHING ;-) 


Download from: 
https://github.com/NormanDunbar/QLAssemblyLanguageMagazine/blob/Issue_007/Issue_007/Assembly_Language_007.pdf


Licence: 

Licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported License (the
“License”). You may not use this file except in compliance with the License. You may obtain
a copy of the License at http://creativecommons.org/licenses/by-nc/3.0. Unless required
by applicable law or agreed to in writing, software distributed under the License is distributed
on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied. See the License for the specific language governing permissions and
limitations under the License.


This pdf document was created on ///0/2019 at 18:45. 
Copyright ©2019 Norman Dunbar 


Character Characteristics

UTF8 and the QL
UTF8 Encoding
The QL Character Set
The Code
The Code


6.5 Ql2utf8: Top of the loop - reading bytes
6.6 Ql2utf8: One byte or more?
6.7 Ql2utf8: Handling the UK Pound
6.8 Ql2utf8: Handling copyright
6.9 Ql2utf8: Handling low value ASCII codes
6.10 Ql2utf8: Writing one byte of UTF8
6.11 Ql2utf8: Handling exceptions - the Grave/backtick
6.12 Ql2utf8: Handling exceptions - the Euro Currency symbol
6.13 Ql2utf8: Handling exceptions - the arrow characters
6.14 Ql2utf8: The arrow character table
6.15 Ql2utf8: Two byte characters
6.16 Ql2utf8: Clean up and exit handling
6.17 Ql2utf8: The UTF8 “two byte” character table
7.1 Executing Utf82ql
7.2 Utf82ql: Introductory comments
7.3 Utf82ql: Job header
7.4 Utf82ql: Testing for two channels
7.5 Utf82ql: Initialising constant registers
7.6 Utf82ql: The main loop starts
7.7 Utf82ql: Testing for one byte UTF characters
7.8 Utf82ql: Handling exceptions - the grave/backtick character
7.9 Utf82ql: Handling one byte UTF characters
7.10 Utf82ql: Testing for two byte UTF characters
7.11 Utf82ql: Testing for three byte UTF characters
7.12 Utf82ql: Error out on UTF8 four byte characters
7.13 Utf82ql: Handling UTF8 two byte characters
7.14 Utf82ql: Handling exceptions - the UK Pound symbol
7.15 Utf82ql: Handling exceptions - the copyright symbol
7.16 Utf82ql: Two byte UTF8 character handling
7.17 Utf82ql: Invalid UTF8 character detected
7.18 Utf82ql: Three byte UTF8 character handling
7.19 Utf82ql: Fetching the third byte
7.20 Utf82ql: Handling the Euro Currency symbol
7.21 Utf82ql: Handling the arrow characters
7.22 Utf82ql: Writing and reading bytes
7.23 Utf82ql: Scanning for UTF8 words
7.24 Utf82ql: UTF8 character found
7.25 Utf82ql: Missing UTF8 word
7.26 Utf82ql: Clean up and exit handling
7.27 Utf82ql: The UTF8 “two byte” character table


Feedback 


Please send all feedback to . You may also send articles 
to this address, however, please note that anything sent to this email address may be used in a 
future issue of the eMagazine. Please mark your email clearly if you do not wish this to happen. 


This eMagazine is created in LaTeX source format, aka plain text with a few formatting commands
thrown in for good measure, so I can cope with almost any format you might want to send me. As 
long as I can get plain text out of it, I can convert it to a suitable source format with reasonable 
ease. 


I use a Linux system to generate this eMagazine so I can read most, if not all, Word or MS Office 
documents, Quill, Plain text, email etc formats. Text87 might be a problem though! 


Subscribing to The Mailing List 


This eMagazine is available by subscribing to the mailing list. You do this by sending your 
favourite browser to and clicking on the 
link “Subscribe to our Newsletters”. 


On the next screen, you are invited to enter your email address twice, and your name. If you 
wish to receive emails from the mailing list in HTML format then tick the box that offers you that 
option. Click the Subscribe button. 


An email will be sent to you with a link that you must click on to confirm your subscription. Once 
done, that is all you need to do. The rest is up to me! 




10 Chapter 1. Preface 


Contacting The Mailing List 


I’m rather hoping that this mailing list will not be a one-way affair, like QL Today appeared to be. 
I’m very open to suggestions, opinions, articles etc from my readers, otherwise how do I know 
what I’m doing is right or wrong? 


I suspect George will continue to keep me correct on matters where I get stuff completely wrong, 
as before, and I know George did ask if the list would be contactable, so I’ve set up an email
address for the list, so that you can make comments etc as you wish. The email address is: 


assembly@qdosmsq.dunbar-it.co.uk 


Any emails sent there will eventually find me. Please note, anything sent to that email address will 
be considered for publication, so I would appreciate your name at the very least if you intend to 
send something. If you do not wish your email to be considered for publication, please mark it 
clearly as such, thanks. I look forward to hearing from you all, from time to time. 


If you do have an article to contribute, I’ll happily accept it in almost any format - email, text,
Word, Libre/Open Office odt, Quill, PC Quill, etc etc. Ideally, a LaTeX source document is the best
format, because I can simply include those directly, but I doubt I’ll be getting many of those! But
not to worry, if you have something, I’ll hopefully manage to include it.




No Feedback so far! 


I’m very grateful to Tobias Fröschle who submitted this article for publication.


It concerns the various ways in which the Q68 can move memory around. It appears that the Q68 
has a lot of memory, and doing simple things like scrolling the screen around can take quite some 
time. 


I hope you enjoy the following. 


Messing Around with the Q68 


While Norman tends to write his stuff in GWASS, my favourite assembler is QMac. The choice
is mainly a matter of taste: overall, GWASS has similar features to QMac. So bear with me, the
code examples here will be in QMac lingo.


In the quiet days between Christmas and going back to work, called “between years” in Germany,
there was a bit of time to mess around with the Q68 and the trusty QMac Assembler. I was always
a bit concerned about how the Q68 can handle the massive amounts of memory that need to be
shoved around in order to handle a high-colour screen.


My favourite resolution on the Q68 is the high colour mode with 512 by 384 pixels. One pixel takes
16 bits in this resolution, a 68000 word. That makes 1 kByte per scan-line, 384 kBytes in all for
the whole screen. Scrolling this screen to the left by one pixel, for example, requires moving 384
x (1024 - 2) bytes of memory. Scrolling the whole screen to the left by 512 pixels in one-pixel
increments, to create smooth animation, requires those 384 kBytes to be moved 512 times - a
whopping 192 MBytes of memory shoved around. (In a game, for example, you would normally
scroll in larger increments to speed things up.)
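The arithmetic above is easy to check. The following C sketch is not part of the original article; the constant and function names are invented purely for illustration. It assumes 1 kByte means 1024 bytes and, like the text, counts a full 384 kByte copy for every scroll step.

```c
/* Screen geometry of the 512 x 384 high colour mode described in the text. */
enum { SCR_WIDTH = 512, SCR_HEIGHT = 384, BYTES_PER_PIXEL = 2 };

/* Bytes in one scan line: 512 pixels of 2 bytes each. */
unsigned long bytes_per_line(void)
{
    return (unsigned long)SCR_WIDTH * BYTES_PER_PIXEL;
}

/* Bytes in the whole screen. */
unsigned long bytes_per_screen(void)
{
    return bytes_per_line() * SCR_HEIGHT;
}

/* Bytes moved when scrolling across the full screen width in steps of
   step_pixels pixels. step_pixels must not be zero. */
unsigned long bytes_for_full_scroll(unsigned step_pixels)
{
    return bytes_per_screen() * (SCR_WIDTH / step_pixels);
}
```

Calling bytes_for_full_scroll(1) reproduces the 192 MByte figure, while a larger step such as 8 pixels divides the total by the same factor, which is why a game would scroll in bigger increments.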


All the experiments below will work on the Q68 or on QPC2 (provided you set the screen
resolution to 512 x 384 and 16-bit colour). The screen start address will be different, though. (You
can find it out with the SCR_BASE S*BASIC command.)




14 Chapter 3. The Fastest Scrolling in the West 


To put things in perspective: This action results in roughly 12 times more memory to shove around 
than an original Black Box would need to do for the same action. Granted, on the Q68 we don’t 
need to shift the screen words themselves to scroll horizontally, which makes matters a bit simpler 
(thus faster), but it is still a huge task. I just wanted to see how the Q68 would cope with this. 


The Straight-Forward Approach 


Let’s start simple (or, should I call that naive?): Two nested loops, the innermost moves one scan- 
line one pixel to the left using two address registers, the outermost iterates over all scan-lines. Call 
that routine 512 times and we’re done: 


; Scrolls the screen one pixel to the left

Lscroll
        movem.l a0-a1,-(sp)
        lea     screen_start,a0
        lea     2(a0),a1
        move.w  #384-1,d1       ; 384 scan-lines
lineLoop
        move.w  #512-1,d0       ; 512 pixels to move
cpy_loop
        move.w  (a1)+,(a0)+
        dbf     d0,cpy_loop
        dbf     d1,lineLoop


Listing 3.1: Scrolling one pixel leftwards 


We’re at 90 seconds now to scroll a screen across the whole screen width and the scrolling looks,
admittedly, pretty lame (remember, that is moving 192 megabytes of memory...). The first
improvement that comes to mind is a long-word move in cpy_loop, which would allow us to save half
of the inner loop iterations. That should be something like 30-50% faster on a real 68000. On a Q68,
unfortunately, it isn’t, for some reason. In fact, it is only a few seconds faster and not really a
significant improvement. Time to look for some more drastic means to speed things up:


Unrolling loops (or: How to waste Precious Amounts of Memory) 


What slows the straightforward approach down quite a bit are the two nested loops (one per width 
of screen, one per height of the screen). If we could get rid of these, or at least one of them, we 
should achieve a significant improvement. And, in fact, we can. The Q68 has so much memory 
that we can put that to good use: Instead of looping around one single longword move, we can 
write all the 256 iterations in a row into our source code, voila, the inner loop is gone. Because 
programmers are lazy and writing 256 identical statements is a bit boring, it is now time to show 
the interested (?) reader what the “Mac” in QMac is good for: Time for some macro trickery. 


REPT    MACRO   num,args
        LOCAL   count,pIndex,pCount
count   SETNUM  1
Lp      MACLAB
pCount  SETNUM  [.NPARAMS]-1
pIndex  SETNUM  2
pLoop   MACLAB
        EXPAND
[.PARAM([pIndex])]
        NOEXPAND
pCount  SETNUM  [pCount]-1
pIndex  SETNUM  [pIndex]+1
        IFNUM   [pCount] >= 0 GOTO pLoop
count   SETNUM  [count]+1
        IFNUM   [count] <= [num] GOTO Lp
        ENDM


Listing 3.2: The REPT Macro 


If this is all Chinese to you, the whole macro simply repeats the text you give it in its second
through last argument(s) as many times as you specify in the first, like


NotUseful
        REPT 256,{ nop },{ clr.l d0 }


Listing 3.3: A simple REPT example 


will expand to 256 pairs of NOP and CLR.L D0 instructions in your code. The GOTO directives
don’t do anything in your finished program, but rather have the assembler running in circles
producing source code for you (nice, isn’t it?). The outer loop, starting at Lp, runs as many times as
you give in the first parameter; the inner loop, at pLoop, steps through the remaining parameters,
emitting each one in turn. Ideal stuff for lazy programmers.
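If the macro language is unfamiliar, the same two nested loops can be sketched in C. This is only an illustration of what the assembler does at assembly time, not QMac code and not part of the original article; the function name rept_expand and its parameters are hypothetical.

```c
#include <string.h>

/* Append 'count' copies of the 'nlines' source lines to the buffer 'out',
   one line of text per entry, each terminated by a newline.
   Returns the number of lines emitted, or -1 if 'out' is too small. */
int rept_expand(char *out, size_t out_size, int count,
                const char *const lines[], int nlines)
{
    size_t used = 0;
    int emitted = 0;
    int i, j;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (i = 0; i < count; i++) {           /* outer loop: label Lp    */
        for (j = 0; j < nlines; j++) {      /* inner loop: label pLoop */
            size_t len = strlen(lines[j]);

            /* Room is needed for the text, a newline and the terminator. */
            if (used + len + 2 > out_size)
                return -1;
            memcpy(out + used, lines[j], len);
            used += len;
            out[used++] = '\n';
            out[used] = '\0';
            emitted++;
        }
    }
    return emitted;
}
```

With a count of 2 and the two lines " nop" and " clr.l d0", the buffer ends up holding four lines of source, alternating between the two instructions, which is the same pattern the macro produces 256 times over.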


The macro would look a bit different when written in GWASS, which uses a similar, but
slightly different, macro syntax (that I don’t happen to be familiar with, unfortunately (and
I should really work on my writing style - that looks like a programmer’s)).


Now back to our screen scrolling problem: We wanted to unroll the inner loop which iterates over 
the pixels in one single scan-line to get rid of the inner loop. So, let’s place that macro invocation 
(incantation?) in place of that inner loop, replacing it with 256 long word move instructions: 


; Scrolls the screen one pixel to the left

Lscroll
        movem.l a0-a1,-(sp)
        lea     screen_start,a0
        lea     2(a0),a1
        move.w  #384-1,d1       ; 384 scan-lines
lineLoop
        REPT    256,{ move.l (a1)+,(a0)+ }
        dbf     d1,lineLoop


Listing 3.4: Unrolling the inner loop


The REPT invocation looks unremarkable, but if you have a look at the produced assembly listing, 
you will find that the assembler has just expanded the macro to 256 lines of code, effectively 
replacing that inner loop (this also blew our code for that loop from xxx to yyy bytes. But after all, 
we are on a Q68 or QPC and have plenty of memory to trade for). 


If you run the above code, you will find it runs about three times faster than the previous version, 
so we have bought execution speed for memory. Want to drive this a bit further by unrolling the 
outer loop as well? Try something like 




screenLongs     EQU     512*384*2/4
        REPT    [screenLongs],{ move.l (a1)+,(a0)+ }


Listing 3.5: Unrolling the outer loop 


But that might be a little ridiculous, so I have left this as an exercise for the reader (Ha! I always
wanted to use this sentence somewhere).


Can we still do better? Sure. 


3.4 MOVEM.L Can Work in Other Places Other Than the Stack 


There is one instruction in the 68k instruction set that can shove memory about in large chunks — 
The MOVEM instruction. You would normally use it to save and restore registers to and from the 
stack in subroutines, but its use is not restricted to that. In cases where you have many registers to 
spare, you can also use it to implement large block moves. 


There’s just one single caveat: The MOVEM instruction does not accept the “post-increment”
addressing mode as a destination, which is what we would need to do a block move, so a simple


        movem.l (a0)+,REGSET
        movem.l REGSET,(a1)+    ; this instruction does not exist


Listing 3.6: MOVEM restrictions 


will unfortunately not work, so, in order to repair this, we need to increment the target register
with a separate instruction.


So, let’s assume you can spare (or free up) data registers d3-d7 and address registers a2-a6 in our
scrolling routine. We can then move a whopping 40 bytes per instruction, as in the following (note
that the backslash in a macro invocation is understood as a line continuation character in QMac):


        REPT    25,{ movem.l (a1)+,d3-d7/a2-a6 },\
                   { movem.l d3-d7/a2-a6,(a0) },\
                   { adda.l #40,a0 }


Listing 3.7: Improving the REPT macro 


This time our macro receives 4 arguments, the repetition count and the three lines to repeat. The 
macro magic will repeat these three lines 25 times in an unrolled loop, creating copy commands 
for 250 longwords. Oops, 6 missing to a complete scan-line, so add a 


        REPT    6,{ move.l (a1)+,(a0)+ }
Listing 3.8: Scrolling one pixel leftwards 


after it to create code to move the last 6 long words of a scan-line. 
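The split into 25 big moves plus 6 single ones can be sketched in C. This is a model of the idea only, not the author's code: the names are made up, a local array stands in for the ten registers d3-d7/a2-a6, and the function copies between two separate, properly aligned line buffers rather than scrolling a real screen.

```c
#include <stdint.h>
#include <string.h>

enum { LONGS_PER_LINE = 256, LONGS_PER_MOVEM = 10 };

/* Copy one 256 longword scan line from src to dst, ten longwords at a time,
   then mop up whatever does not fit into a whole ten longword chunk. */
void copy_line_chunked(uint32_t *dst, const uint32_t *src)
{
    uint32_t regs[LONGS_PER_MOVEM];     /* stands in for d3-d7/a2-a6 */
    int full_chunks = LONGS_PER_LINE / LONGS_PER_MOVEM;    /* 25 */
    int leftover = LONGS_PER_LINE % LONGS_PER_MOVEM;       /* 6  */
    int i;

    for (i = 0; i < full_chunks; i++) {
        memcpy(regs, src, sizeof regs);  /* movem.l (a1)+,d3-d7/a2-a6 */
        src += LONGS_PER_MOVEM;
        memcpy(dst, regs, sizeof regs);  /* movem.l d3-d7/a2-a6,(a0)  */
        dst += LONGS_PER_MOVEM;          /* adda.l  #40,a0            */
    }
    for (i = 0; i < leftover; i++)
        *dst++ = *src++;                 /* move.l  (a1)+,(a0)+       */
}
```

Ten registers of four bytes each give the 40 byte step, and 25 chunks of ten longwords leave exactly 6 longwords over in a 256 longword line.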


This is only marginally faster than the above unrolled loop on a Q68, but it saves a significant
amount of code space while giving an even (slightly) better runtime speed. I was actually expecting
a bit more speedup, but Q68 instruction timings seem to differ from the original 68k.


The MOVEM block move is the fastest way to move large chunks of memory around using a 
68000 CPU (In case you happen to know anything faster, I'd like to hear from you), so, we’re at 
the end here. Really? No, not quite: 




If Software Can’t Cope, Use Hardware 


If you want to speed up the scrolling even further, you can use the SD memory in the Q68. This 
is a small (read: scarce, about 12k) amount of very fast memory that can be used for time-critical 
routines. 


Code like the above (that mainly accesses “slow” memory) can be expected to run about three 
to four times faster in Q68 SD RAM than in the normal DRAM areas. As the amount of space 
available in fast memory is limited (some of it is already used by SMSQ/E as well), you might 
want to keep the usage of fast memory as low as possible. Also note that, just like the RESPR area, 
it is not possible to release space in fast memory once it has been allocated. A game, for example, 
could however easily argue that you would reset the computer anyway after finishing. 


My tests resulted in about a three-fold speed increase once the above routines were copied to fast 
memory and executed from there. 


Lookup tables are useful. Remember when you were at school and had to find the logarithm of a 
number? You didn’t have to calculate it every time it was needed, someone else did it for you and 
put the details in a booklet¹. When writing code it’s sometimes useful to use lookup tables rather
than doing a possibly resource intensive calculation each and every time. 


The rest of this section shows a couple of uses for lookup tables. 


Bits and Bobs 


Here’s a sequence of 10 numbers, they are all integers: 


0, 1, 1, 2, 1, 2, 2, 3, 1, 2.


Q1: Do you know what the next value in the sequence will be? 

Q2: Do you know what the above sequence represents? 

Would it help if I told you that the formula to calculate the value for number ‘n’ in the sequence is 
given by: 

Value(n) = Value(int(n/2)) + (n and 1)


For example, to find the value of the number 10, the 11th number in the sequence as we start from 
0, and which just happens to be the answer to Q1 above, we must take value(5) and add on bit 
0 of 10. Of course, we need then to find the answer to Value(2) and add on bit 0 of 5 and so on. 
Recursion anyone? This works out as the following sequence of calculations: 


¹ Ok, I’m probably showing my age here - calculators were not invented/easily available until after I was in secondary
school! We had a booklet of log tables to look up.


20 Chapter 4. Lookup Tables 


Value(10) = Value(5) + (10 and 1)
Value(5) = Value(2) + (5 and 1)
Value(2) = Value(1) + (2 and 1)
Value(1) = Value(0) + (1 and 1)
Value(0) = 0


This gives us, working backwards up the above sequence of calculations:

Value(0) = 0
Value(1) = 0 + 1 = 1
Value(2) = 1 + 0 = 1
Value(5) = 1 + 1 = 2
Value(10) = 2 + 0 = 2


So, the 11th number in the sequence, aka value(10), is 2. That answers Q1, Q2 will be answered 
soon, I promise. 
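For anyone who prefers to see the recursion spelled out before meeting it in assembly, here is the formula as a tiny C function. It is just a sketch of the formula given above; the function name is my own choice and nothing here comes from the original listings.

```c
/* Value(n) = Value(int(n/2)) + (n and 1), with Value(0) = 0. */
unsigned value(unsigned n)
{
    if (n == 0)
        return 0;                       /* the recursion stops here */
    return value(n / 2) + (n & 1u);     /* integer division rounds down */
}
```

Calling value(10) walks down through 5, 2, 1 and 0, exactly as in the worked example, and returns 2.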


Assuming you need to know these numbers in a program you happen to be writing in assembly
language, you could work them out each time. The formula does tend to imply recursion is required
and the following brief section of code will do exactly that.


; On Entry (to Value routine):
; D0.B = Required value for 'n'
; On Exit:
; D1.B = Answer (Value(n)).
; All registers are preserved except D1 and D0.
; Enter at Start for a demo with N = 10. Enter at
; Value, with D0 holding the required byte value, to
; calculate the result for that value.

Start   moveq   #10,d0          ; N = 10
        bsr.s   Value           ; Get Value(N)
; Result is now in D1.B.
Back    moveq   #0,d0           ; No errors
        rts

Value   tst.b   d0              ; N = 0 yet?
        bne.s   More            ; Not yet
        moveq   #0,d1           ; Yes, Value(0) = 0
        rts

More    move.w  d0,-(a7)        ; Save current N
        lsr.b   #1,d0           ; INT(N/2)
        bsr.s   Value           ; Recurse
; On return to here, D1.B holds the Value(N/2) result.
rtnHere move.w  (a7)+,d0        ; Current N again
        btst    #0,d0           ; Anything to add in bit 0?
        beq.s   Done            ; No, even number.
        addq.b  #1,d1           ; Yes, add bit 0 of N

Done    rts


Listing 4.1: Calculating values with recursion 


So, what happens in the above when we use 10 as the required value? 


1. At the label Value, D0 = 10 and the stack contains the return address of label Back, and the
return to SuperBasic address. The stack looks like this:

SuperBasic
Back

2. As D0 is not yet zero, we end up at label More where we stack D0, shift it right to get 5, and
call Value again. At Value, the stack looks like this:

SuperBasic
Back
10
rtnHere

3. As D0 is not yet zero, we end up at label More where we stack D0, shift it right to get 2, and
call Value again. At Value, the stack looks like this:

SuperBasic
Back
10
rtnHere
5
rtnHere

4. As D0 is not yet zero, we end up at label More where we stack D0, shift it right to get 1, and
call Value again. At Value, the stack looks like this:

SuperBasic
Back
10
rtnHere
5
rtnHere
2
rtnHere

5. As D0 is not yet zero, we end up at label More where we stack D0, shift it right to get 0, and
call Value again. At Value, the stack looks like this:

SuperBasic
Back
10
rtnHere
5
rtnHere
2
rtnHere
1
rtnHere

6. At Value, D0 is now zero, so we store zero in D1 and return to rtnHere.

7. At rtnHere, we unstack 1 into D0. The stack now looks like:

SuperBasic
Back
10
rtnHere
5
rtnHere
2
rtnHere

As D0 is odd, we add 1 to D1. The running total is now 1. Then we execute an RTS
instruction and end up back at rtnHere.

8. At rtnHere, we unstack 2 into D0. The stack now looks like:

SuperBasic
Back
10
rtnHere
5
rtnHere

As D0 is even, we don’t add 1 to D1. The running total is still 1. Then we execute an RTS
instruction and end up back at rtnHere.

9. At rtnHere, we unstack 5 into D0. The stack now looks like:

SuperBasic
Back
10
rtnHere

As D0 is odd, we add 1 to D1. The running total is now 2. Then we execute an RTS
instruction and end up back at rtnHere.

10. At rtnHere, we unstack 10 into D0. The stack now looks like:

SuperBasic
Back

As D0 is even, we don’t add 1 to D1. The running total is still 2. Then we execute an RTS
instruction and end up back at Back.

11. At Back we clear D0 and return to SuperBasic. The value in D1 is 2, which is the correct
value for the 11th number in the sequence.


The test code above is fine if you only need one or two values, but if your code needs lots, then a
lookup table would be a good trade-off between memory usage - you need extra space for the table
- and CPU resources - if you have to do lots of calculations each time. The following code sets up
a lookup table for all values from 0 to 255 - so that’s a good reason for having a single byte for
each value.


; Lookup Table initialisation.
;
; Register Usage:
;
; D0.W = N (from 0 to 255).
; D2.B = INT(N/2), Value(N).
; A2.L = Pointer to start of lookup table.

Entry   bra     Start           ; Skip the lookup table
Lookup  ds.b    256             ; Lookup table
Start   moveq   #0,d0           ; Value(0)
        lea     Lookup,a2       ; Guess!
        move.b  d0,(a2)         ; Save Value(0) in table
Loop    addq.b  #1,d0           ; Next n
        bcs.s   Done            ; Bale out at 256
        move.w  d0,d2           ; Copy n to D2
        lsr.w   #1,d2           ; INT(n/2)
        move.b  (a2,d2.w),d2    ; Value(INT(n/2))
        btst    #0,d0           ; Anything to add?
        beq.s   Store           ; No, just store Value(n)
        addq.b  #1,d2           ; Yes, add a bit
Store   move.b  d2,(a2,d0.w)    ; Store Value(n)
        bra.s   Loop            ; Not done yet
Done    moveq   #0,d0           ; No errors
        rts


Listing 4.2: Initialising the lookup table 


If the program initialises the lookup table during startup, then any time it needs to extract a value, 
it’s as simple as: 


        move.w  #n,d0           ; D0 must be 0 - 255
        lea     Lookup,a2
        move.b  (a2,d0.w),d0    ; Value(d0.b)


Listing 4.3: Using the lookup table to find a value 


At this point, D0.B holds the result of Value(n). Keep in mind that the lookup table only gives
values between 0 and 255, but D0 is a word in the above for ease of indexing the table.
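The table approach translates directly into C, which can be handy for checking the assembly version against something easier to read. The sketch below is an illustration only; the names lookup, init_lookup and value_from_table are assumptions of mine and do not appear in the article's code.

```c
/* One byte per entry, for n = 0 to 255. */
static unsigned char lookup[256];

/* Fill the table once, at start up. Each entry is built from the entry
   for INT(n/2), which is always lower in the table and so already done. */
void init_lookup(void)
{
    unsigned n;

    lookup[0] = 0;                                  /* Value(0) = 0 */
    for (n = 1; n < 256; n++)
        lookup[n] = (unsigned char)(lookup[n >> 1] + (n & 1u));
}

/* After initialisation, finding Value(n) is a single indexed read. */
unsigned value_from_table(unsigned char n)
{
    return lookup[n];
}
```

Note that no recursion is needed here: filling the table from the bottom up means every value the formula asks for has been stored before it is wanted.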


So, what’s it all about I hear you ask? It’s simple, the sequence I gave you way back at the 
beginning is the number of ‘1’ bits in any byte value. 


Taking 10 as an example, it is 0000 1010 in binary, while 5, half of 10, is 0000 0101 in binary - the
same number of bits. So, that works for even numbers, how about odd ones? Well, half of 5 is 2.5
but, as we are rounding down, that’s 2. Two is 0000 0010 in binary. Doubling 2 gives 4, or
0000 0100 in binary, and 5 is just 4 plus 1. So, the number of bits in an odd number is still the
same as the number in half of it, plus bit 0. Simples².
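If you would rather not take that on trust, the claim can be checked for every byte value. The C sketch below compares the recurrence with a plain bit-by-bit count. It is an illustration only, with function names of my own choosing, and is not taken from the article's listings.

```c
/* Count the '1' bits in a byte the obvious way, one bit at a time. */
unsigned count_bits(unsigned char b)
{
    unsigned count = 0;

    while (b != 0) {
        count += b & 1u;    /* add bit 0 */
        b >>= 1;            /* move the next bit down */
    }
    return count;
}

/* The recurrence: Value(n) = Value(int(n/2)) + (n and 1). */
unsigned value(unsigned n)
{
    return n == 0 ? 0 : value(n / 2) + (n & 1u);
}

/* Returns 1 if the two methods agree for all 256 byte values, else 0. */
int recurrence_matches_bit_count(void)
{
    unsigned n;

    for (n = 0; n < 256; n++)
        if (value(n) != count_bits((unsigned char)n))
            return 0;
    return 1;
}
```

Both functions do the same work, shifting right and adding bit 0; the recurrence simply does it from the top of the number down, and the loop from the bottom up.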


Character Characteristics 


Another useful lookup table would be one which, again, covers 256 byte entries. However, instead 
of values, these bytes contain up to 8 bits of ‘flag’ information. In the C/C++ programming 
languages, there are numerous functions (and also, macros with the same name) which can be 
used to determine if a character is a digit, upper case, lower case, printable etc. This is done with 
a lookup table of bit flags. 


Each character class (numeric, alphabetic etc) has one or more bits set in the table entry to indicate
if this character is indeed a digit, upper case etc. In C68 (look in the header file ctype.h) we
have a number of bit masks defined, as follows, although I am using better names than the C68
code!


UPPERCASE       equ     1       ; Bit 0 = A-Z
LOWERCASE       equ     2       ; Bit 1 = a-z
DIGIT           equ     4       ; Bit 2 = 0-9
SPACE           equ     8       ; Bit 3 = space, tab, linefeed
PUNCTUATION     equ     16      ; Bit 4 = Punctuation
CONTROL         equ     32      ; Bit 5 = Codes < 32
BLANK           equ     64      ; Bit 6 = space, tab
HEXDIGIT        equ     128     ; Bit 7 = A-F, a-f


Listing 4.4: Character attribute bit masks 


So, in the lookup table for the English language, every entry between CODE(’A’) and CODE(’Z’)
will have the UPPERCASE flag, bit 0, set. The entries for ‘A’ through ‘F’ will also have the
HEXDIGIT flag, bit 7, set.

Now, I don’t know about you, but I really don’t fancy typing in 256 entries in a table, with the 
possibility of getting it wrong, somewhere. That’s a nightmare scenario, so the QL can do it for 
me (you, on the other hand, can simply download the code for this issue and get it for free!). I
wrote the following, simple, C68 code to generate the file I needed for assembly routines, using 
my own constant values as listed above. 


The following is the listing of the C68 program, characters_c: 


#include <stdio.h>
#include <ctype.h>

int main(int argc, char *argv[]) {
    int x;

    printf("UPPERCASE equ 1 ; Bit 0 = A-Z\n");
    printf("LOWERCASE equ 2 ; Bit 1 = a-z\n");
    printf("DIGIT equ 4 ; Bit 2 = 0-9\n");
    printf("SPACE equ 8 ; Bit 3 = space tab etc\n");
    printf("PUNCTUATION equ 16 ; Bit 4 = Punctuation\n");
    printf("CONTROL equ 32 ; Bit 5 = Various\n");
    printf("BLANK equ 64 ; Bit 6 = space tab\n");
    printf("HEXDIGIT equ 128 ; Bit 7 = 0-9 a-f A-F\n");

    printf("ALPHABETIC equ UPPERCASE + LOWERCASE\n");
    printf("ALPHANUMERIC equ ALPHABETIC + DIGIT\n");
    printf("PRINTABLE equ BLANK + PUNCTUATION + ALPHABETIC + DIGIT\n");
    printf("GRAPHIC equ PUNCTUATION + ALPHABETIC + DIGIT\n");

    printf("\n\nchartab");
    for (x = 0; x < 256; x++) {
        printf("\tdc.b 0 ");
        if (iscntrl(x)) printf("+ CONTROL ");
        if (isupper(x)) printf("+ UPPERCASE ");
        if (islower(x)) printf("+ LOWERCASE ");
        if (isdigit(x)) printf("+ DIGIT ");
        if (isxdigit(x)) printf("+ HEXDIGIT ");
        if (ispunct(x)) printf("+ PUNCTUATION ");
        if (isspace(x)) printf("+ SPACE ");
        if (x == 9 || x == 32) printf("+ BLANK ");
        printf("; CHR$(%d) = '%c'\n", x, isprint(x) ? x : '.');
    }

    return 0;
}


Listing 4.5: C68 utility: characters_c


² As the odd, occasional, passing meerkat has been known to utter!


The code above, compiled to characters_exe, generates a file that I can use in my assembly 
code. It does it much faster than I can, and more accurately to boot. 


Note that C68 on the QL doesn’t have the function isblank, so I’ve hard coded the only two values
that that function applies to, tab (9) and space (32). C68 gives the following character attributes:


UpperCase 65 through 90, ‘A’ through ‘Z’;

LowerCase 97 through 122, ‘a’ through ‘z’;

Digit 48 through 57, ‘0’ through ‘9’;

Hex Digit 48 through 57, 65 through 70, 97 through 102, ‘0’ through ‘9’, ‘A’ through ‘F’, ‘a’
through ‘f’;

WhiteSpace 9 through 13, 32, Tab through Carriage Return, Space;

Blank 9 and 32, Tab and Space;

Control 33 through 47, 58 through 64, 91 through 96, 123 through 126, 128 through 191;

Punctuation 33 through 47, 58 through 64, 91 through 96, 123 through 126, 128 through 191.


The top of the generated file, which I named characters_asm_in, resembles the following: 


UPPERCASE equ 1 ; Bit 0 = A-Z
LOWERCASE equ 2 ; Bit 1 = a-z
DIGIT equ 4 ; Bit 2 = 0-9
SPACE equ 8 ; Bit 3 = space tab etc
PUNCTUATION equ 16 ; Bit 4 = Punctuation
CONTROL equ 32 ; Bit 5 = Various
BLANK equ 64 ; Bit 6 = space tab
HEXDIGIT equ 128 ; Bit 7 = 0-9 a-f A-F
ALPHABETIC equ UPPERCASE + LOWERCASE
ALPHANUMERIC equ ALPHABETIC + DIGIT
PRINTABLE equ BLANK + PUNCTUATION + ALPHABETIC + DIGIT
GRAPHIC equ PUNCTUATION + ALPHABETIC + DIGIT

chartab dc.b 0 + CONTROL ; CHR$(0) = '.'
        dc.b 0 + CONTROL ; CHR$(1) = '.'
        dc.b 0 + CONTROL ; CHR$(2) = '.'
        dc.b 0 + CONTROL ; CHR$(3) = '.'
        dc.b 0 + CONTROL ; CHR$(4) = '.'
        dc.b 0 + CONTROL ; CHR$(5) = '.'
        dc.b 0 + CONTROL ; CHR$(6) = '.'
        dc.b 0 + CONTROL ; CHR$(7) = '.'
        dc.b 0 + CONTROL ; CHR$(8) = '.'
        dc.b 0 + CONTROL + SPACE + BLANK ; CHR$(9) = '.'
        dc.b 0 + CONTROL + SPACE ; CHR$(10) = '.'
        dc.b 0 + CONTROL + SPACE ; CHR$(11) = '.'
        dc.b 0 + CONTROL + SPACE ; CHR$(12) = '.'
        dc.b 0 + CONTROL + SPACE ; CHR$(13) = '.'


Listing 4.6: Extract of the generated file characters_asm_in 


Beware, however, if you view the generated file in an operating system that is not QDOSMSQ 
because some of the QL character codes represent “invalid” characters in some character sets, on 
PCs or Linux, for example. 


So, now that the table has been created, we need some assembly code to call when we want to 
check if, for example, a character code is a digit. Those character attribute functions would look 
like the following. My file is named charAttr_asm_in: 

; All these functions require a character code in D0.B and will
; return D0.B = 0 if the character is invalid, otherwise D0.B will be
; a relatively random non-zero value.
;
; ENTRY Registers:
; D0.B Character code to be tested.
;
; EXIT Registers:
; D0.B Zero - Character test failed. (Z flag set)
;      non-zero - Character test passed.

        in      win1_source_characters_asm_in

; Given a character code in D0.B, extract the character attributes
; bitmap from chartab into D0.B.
;
; Mask the attribute bitmap with the desired attribute mask to get
; the validation result.
;
; Return the result in D0.B with Z set if the test FAILED.
; On the stack we have D1.W.
;
; D1.B = required mask.
; D0.B = character code.

isanything
        move.l  a2,-(a7)        ; Save the worker
        lea     chartab,a2      ; Character attributes table
        and.w   #$00ff,d0       ; Zero-extend: D0 must be a word wide
        move.b  (a2,d0.w),d0    ; Attributes bitmap byte
        and.b   d1,d0           ; Do attributes match?
        move.l  (a7)+,a2        ; Restore worker
        move.w  (a7)+,d1        ; Restore the other worker
        tst.b   d0              ; Z = test failed
        rts




; These just set up the mask we want in D1.W, and jump off to the
; common code above. The unstacking of D1.W and return to caller
; is done above.

isdigit     move.w  d1,-(a7)            ; Save the first worker
            move.b  #DIGIT,d1           ; Required attribute mask
            bra.s   isanything          ; Never return here!

isalpha     move.w  d1,-(a7)            ; Save the first worker
            move.b  #ALPHABETIC,d1      ; Required attribute mask
            bra.s   isanything          ; Never return here!

isalnum     move.w  d1,-(a7)            ; Save the first worker
            move.b  #ALPHANUMERIC,d1    ; Required attribute mask
            bra.s   isanything          ; Never return here!

isupper     move.w  d1,-(a7)            ; Save the first worker
            move.b  #UPPERCASE,d1       ; Required attribute mask
            bra.s   isanything          ; Never return here!

islower     move.w  d1,-(a7)            ; Save the first worker
            move.b  #LOWERCASE,d1       ; Required attribute mask
            bra.s   isanything          ; Never return here!

isxdigit    move.w  d1,-(a7)            ; Save the first worker
            move.b  #HEXDIGIT,d1        ; Required attribute mask
            bra.s   isanything          ; Never return here!

ispunct     move.w  d1,-(a7)            ; Save the first worker
            move.b  #PUNCTUATION,d1     ; Required attribute mask
            bra.s   isanything          ; Never return here!

iscntrl     move.w  d1,-(a7)            ; Save the first worker
            move.b  #CONTROL,d1         ; Required attribute mask
            bra.s   isanything          ; Never return here!

isgraph     move.w  d1,-(a7)            ; Save the first worker
            move.b  #GRAPHIC,d1         ; Required attribute mask
            bra.s   isanything          ; Never return here!

isprint     move.w  d1,-(a7)            ; Save the first worker
            move.b  #PRINTABLE,d1       ; Required attribute mask
            bra.s   isanything          ; Never return here!

isspace     move.w  d1,-(a7)            ; Save the first worker
            move.b  #SPACE,d1           ; Required attribute mask
            bra.s   isanything          ; Never return here!

isblank     move.w  d1,-(a7)            ; Save the first worker
            move.b  #BLANK,d1           ; Required attribute mask
            bra.s   isanything          ; Never return here!


Listing 4.7: Character attributes library - charAttr_asm_in 


How these work is pretty simple: 


1. We enter with the character code to be tested in D0.B. As we will be about to trash it, we
   save D1.W on the stack prior to loading its low byte with the required attribute mask that we
   need for the current test.

2. A branch is then made to the common code which saves A2.L as we will be using it. The
   character’s attribute bitmap is then extracted from the table. This bitmap is appropriate to
   the character code originally in D0.B but which we have now extended to word sized to
   index into the attribute bitmap table.

3. The attribute bitmap is ANDed with the desired attribute mask and the result in D0.B will be
   zero if there are no common bits in the two masks - the test has failed, or non-zero if at least
   one pair of common bits matched.

4. The stack is then tidied and we return to the caller with the Z flag set to indicate a failure,
   unusually, or unset to show that the character code in D0.B was a character which belonged
   to the attribute set we were interested in - a digit, an upper case letter etc.
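
For readers who find the table lookup easier to follow in a high level language, here is a minimal Python sketch of the same idea. It is only a model of the logic, not part of the library. The attribute bit values and the `build_chartab` helper are assumptions made for illustration, as the real values live in the generated table file.

```python
# Hypothetical attribute bits. The real values are defined alongside the
# generated table and are not shown in this section.
DIGIT = 0x01
UPPERCASE = 0x02
LOWERCASE = 0x04
ALPHABETIC = UPPERCASE | LOWERCASE
ALPHANUMERIC = ALPHABETIC | DIGIT


def build_chartab():
    """Build a 256 entry table holding one attribute bitmap per code."""
    table = [0] * 256
    for code in range(ord("0"), ord("9") + 1):
        table[code] |= DIGIT
    for code in range(ord("A"), ord("Z") + 1):
        table[code] |= UPPERCASE
    for code in range(ord("a"), ord("z") + 1):
        table[code] |= LOWERCASE
    return table


CHARTAB = build_chartab()


def isanything(code, mask):
    """Return the common bits (non-zero = passed, zero = failed)."""
    return CHARTAB[code & 0xFF] & mask


def isdigit(code):
    return isanything(code, DIGIT)


def isalnum(code):
    return isanything(code, ALPHANUMERIC)
```

As in the assembly version, a zero result means the test failed and any non-zero result means it passed, so callers should test for zero rather than compare against a particular value.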


In your code, this can be used as follows: 


            in      charAttr_asm_in

            move.b  (a2),d0         ; Get character code from buffer
            bsr     isalnum         ; Is it a letter or digit?
            beq.s   notAlnum        ; No, it is not


Listing 4.8: Using the charAttr_asm_in routines 


This code is useful when writing something like a lexer (part of a compiler, assembler etc) or
where you are processing text for some reason. It can save you having to check that the character
in D0.B is less than or equal to ‘Z’ and greater than or equal to ‘A’, or less than or equal to ‘z’ and
greater than or equal to ‘a’ - and so on. (Yes, I know, the 68020 has the CMP2 instruction which
makes this easier.)


4.2.1 A Final Thought


If necessary, the 256 byte table of attributes could be created, then saved as a binary file and binary 
included into your application’s code, using the appropriate command for your assembler. On 
GWASS this is the LIB or the INCBIN command. 


For homework, you could convert the character attribute functions to be SuperBASIC extensions? 
If you feel the need? Maybe? 


Chapter 5. UTF8 and the QL


UTF8 is a character set much loved, perhaps, by Linux, MacOS and, increasingly, Windows
computers. As it happens, most HTML pages, as well as almost all XML files, are themselves
in UTF8 format. What is it and how does it affect the QL?


I spend more time editing files, at least to get a first draft, in Linux. When I copy the files up to 
my QPC session and open them in QD, a couple of things happen: 


• QD converts all my runs of 4 spaces to a tab character, even though I’ve repeatedly asked it
  not to. I’m rapidly losing patience with QD!

• Some of the QL characters, happily typed on Linux, are shown as weird blobs in QD. The
  UK Pound sign, for example, or the Euro are blobs in QD when they were fine on Linux.
  Why?

• Writing QL files back to, say, DOS1_, then opening them in a Linux editor shows many
  characters as the UTF8 replacement character with Code Point U+FFFD, the black blob with
  a question mark in it. Oops! Don’t even try opening a QL file with the arrow characters
  within, you don’t want to go there!


5.1 UTF8 Encoding


UTF8 is an encoding standard for plain text. It is a multi-byte character set, which simply means
that some characters in the set take up more than one byte when viewed “in the raw” (or with a
hex dump). UTF8 has a big enough encoding method that all (I am led to believe) the characters in
all the languages of the world, plus all their punctuation, numbers and so on, can be represented.


UTF8 characters can be 1, 2, 3 or 4 bytes long. The UK Pound sign, for example, is two bytes
- $C2A3, the Euro symbol is three bytes - $E282AC, while the humble digit seven remains as a
single byte - $37.


The rules are simple: 


• Each character has what is known as a “code point” and is represented by the expression
  “U+nnnn” where the “nnnn” part is between four and six hex digits. Single byte characters,
  like the digits, are also shown as “U+nnnn” but the first two digits are zeros - “U+0037” for
  our digit seven.

• ASCII characters, below 128, are represented in UTF8 by a single byte, exactly the same
  as the current ASCII byte. Handy! Not on a QL of course! Code points U+0000 through
  U+007F are represented here.

• Characters with codes of 128 and above (which are not strictly ASCII at all) are split into
  three groups.

  - Code points from U+0080 through U+07FF are all two bytes long.
  - Code points from U+0800 through U+FFFF are all three bytes long.
  - Code points from U+10000 through U+10FFFF are all 4 bytes long.
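
The ranges above translate directly into a small lookup. The following Python sketch is purely illustrative (the function name `utf8_length` is made up for this example) and simply returns how many bytes a given code point needs.

```python
def utf8_length(code_point):
    """Return the number of UTF8 bytes needed for a code point."""
    if code_point < 0:
        raise ValueError("code points cannot be negative")
    if code_point <= 0x7F:
        return 1            # U+0000 to U+007F
    if code_point <= 0x7FF:
        return 2            # U+0080 to U+07FF
    if code_point <= 0xFFFF:
        return 3            # U+0800 to U+FFFF
    if code_point <= 0x10FFFF:
        return 4            # U+10000 to U+10FFFF
    raise ValueError("code point is beyond U+10FFFF")
```

For the three running examples, `utf8_length(0x37)`, `utf8_length(0xA3)` and `utf8_length(0x20AC)` give the byte counts for the digit seven, the UK Pound and the Euro respectively.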


So, how do we encode an ASCII character onto one, two, three or four bytes of UTF8? Easy! 


• In ASCII, all characters with the top bit (bit 7) clear will have their UTF8 code point value
  encoded into the lower 7 bits of a single byte. In other words, 0xxxxxxx, allowing 7 bits to
  encode the code point.

• Two byte UTF8 characters have the layout 110xxxxx 10xxxxxx, and this allows for 11 bits
  to encode the code point within the two bytes.

• Three byte UTF8 characters have the layout 1110xxxx 10xxxxxx 10xxxxxx, allowing for
  16 bits of code point information.

• Finally, four byte UTF8 characters have the layout 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  allowing for a massive 21 bits of code point values.
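
The four layouts can be turned into code almost word for word. This is a hedged sketch in Python rather than anything used by the utilities in this issue, and the name `utf8_encode` is an assumption for the example. Each branch sets the fixed marker bits and then drops the code point into the 'x' positions, six bits at a time from the right.

```python
def utf8_encode(code_point):
    """Encode one code point into its UTF8 byte sequence."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point out of range")
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

The sketch does not reject the surrogate range (U+D800 to U+DFFF), which a strict encoder would refuse. That is left out here to keep the bit layouts in plain view.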


So, how does that work for our examples, the digit seven, UK Pound and the Euro symbol? 


The digit seven is a single byte, and is simply the current ASCII value, $37, because that already 
has the top bit clear and the remaining bits holding the ASCII character, or the UTF8 code point 
as it is now known. 


The UK Pound has code point U+00A3. This is higher than the highest single byte character,
U+007F, but lower than the highest for two byte characters, U+07FF, so it is a two byte character.


A two byte character is of the format 110xxxxx 10xxxxxx where the significant bits of the
code point value are encoded into the bits marked with an ’x’, filling from the right. As the code
point is simply a hexadecimal number, U+00A3 is just 00000000 10100011 in binary, so those 8
bits get encoded onto the ’x’ bits, giving 110xxx10 10100011. As we cannot have any spare ’x’ bits
left over, those that remain are cleared to zero, giving 11000010 10100011 which is $C2 A3 - and
that’s the character code for a Pound Sign in UTF8.


Taking the Euro next, it has code point U+20AC which puts it into the three byte set of characters.
Those are in the format 1110xxxx 10xxxxxx 10xxxxxx. Once again, we take the code point in
binary and mask it onto the ’x’ bits, filling with leading zeros as appropriate.


Code point U+20AC is 00100000 10101100, which is 16 bits. As a three byte character allows for
up to 16 bits, it fits nicely without any spare ’x’ bits. The result is 11100010 10000010 10101100
or $E2 82 AC and that’s the three bytes we use for the Euro symbol.
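
If you have Python to hand on the Linux side, its built in UTF8 codec makes a handy independent check of worked examples like these. The helper below is just a convenience written for this illustration (`show_bits` is not a standard function). It prints each encoded byte as eight binary digits so the marker bits and the 'x' bits can be compared by eye.

```python
def show_bits(code_point):
    """Return the UTF8 bytes of a code point as groups of 8 binary digits."""
    encoded = chr(code_point).encode("utf-8")
    return " ".join(format(byte, "08b") for byte in encoded)


print(show_bits(0x37))    # digit seven
print(show_bits(0xA3))    # UK Pound
print(show_bits(0x20AC))  # Euro
```

The Pound comes out as 11000010 10100011 and the Euro as 11100010 10000010 10101100, which are $C2 A3 and $E2 82 AC.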


5.2 The QL Character Set


As ever, nothing is straightforward in the QL world. Sir Clive has done his best to unstandardise
things. However, I suppose he had only 256 characters to fit ASCII and a few “foreign” characters
that might be needed in Europe. America seems to get by on only 7 bit ASCII anyway! So,
what’s broken in the QL’s character set?


• The UK Pound symbol is character 96 ($60) on the QL, but in Latin-1 and Unicode it is
  character 163 ($A3).

• The copyright symbol is character 127 ($7F) on the QL but is 169 ($A9) in Latin-1 and
  Unicode.

• The Euro, which came a long time after the QL, doesn’t exist in the BBQL character set, but
  under SMSQ, it is at character 181 ($B5).

• Characters above 128 ($80) are a mess on the QL. Many are simply missing, especially
  some of the, I assume, lesser used accented characters.
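
To make those mismatches concrete, here is a small Python sketch that maps the three QL codes listed above to their Unicode code points and then to UTF8 bytes. The dictionary and function names are invented for this example. The code points themselves (U+00A3, U+00A9 and U+20AC) are standard.

```python
# QL character code -> Unicode code point, for the three cases above.
QL_TO_CODE_POINT = {
    0x60: 0x00A3,  # QL 96  is the UK Pound
    0x7F: 0x00A9,  # QL 127 is the copyright symbol
    0xB5: 0x20AC,  # QL 181 is the Euro (SMSQ only)
}


def ql_exception_to_utf8(ql_code):
    """Return UTF8 bytes for a mismatched QL code, or None if not listed."""
    code_point = QL_TO_CODE_POINT.get(ql_code)
    if code_point is None:
        return None
    return chr(code_point).encode("utf-8")
```

Any QL code not in the dictionary returns `None`, meaning it needs no special treatment from this particular table.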


So while my Linux editor can open files created on the QL, and the QL can open (most) files 
created on the Linux side of things, it’s not completely the same. A conversion is required, one to 
go from the QL to Linux (MacOS, Windows etc) and one to come back again. 


I guess we need some assembly code then? Read on. 


Chapter 6. Ql2utf8 Utility


This utility is what I would need to use when I’ve saved a file on the QL, or in QPC, and I need to
transfer it down to the Linux box for some processing - say, for example, to get the finished and
tested source code into an article like this one!


The utility is an example of the kind of QL program which is collectively becoming known as a
“YAF”¹.


The utility reads a QL created text file, where the content is any of the QL character set up to,
but not above, character 191 ($BF), which is the down arrow. Anything above that is a control
character and is unprintable - undefined results may occur if any are present in the QL file.


It is executed in the usual manner: 


ex ram1_ql2utf8_bin, ram1_ql_txt, ram1_utf8_txt
Listing 6.1: Executing ql2utf8 


The input file, ram1_ql_txt will be read in, and each byte converted to the appropriate UTF8 byte 
sequence, and written out to the ram1_utf8_txt file. The latter file will be used on my Linux box, 
but Windows and MacOS users can also take advantage. 


Right, enough waffle, on with the code. 


The Code 


As ever, my code starts with an introductory header and some equates. This utility is no different 
as you can see below. 


; QL2UTF8:
;
; This filter converts QL text files to UTF8 for use on Linux, Mac or
; Windows where most modern editors etc, default to UTF8.
;
; EX ql2utf8_bin, input_file, output_file_or_channel
;
; 26/09/2019 NDunbar Created for QDOSMSQ Assembly Mailing List
;
; (c) Norman Dunbar, 2019. Permission granted for unlimited use
; or abuse, without attribution being required. Just enjoy!
;

; How many channels do I want?
numchans    equ     2               ; How many channels required?

; Stack stuff.
sourceId    equ     $02             ; Offset(A7) to input file id
destId      equ     $06             ; Offset(A7) to output file id

; Other Variables
pound       equ     96              ; UK Pound sign.
copyright   equ     127             ; (c) sign.
grave       equ     159             ; Backtick/Grave accent.
euro        equ     181             ; Euro symbol
err_bp      equ     -15
err_eof     equ     -10
me          equ     -1
timeout     equ     -1


Listing 6.2: Ql2utf8: Introductory comments

¹ Yet Another Filter!


The main entry point for the program is next. This section of code contains the usual QDOS Job 
header and a few checks to ensure that we only get a pair of channel IDs on the stack. If the user 
decided to pass over a command string as well, it would be ignored. 


; Here begins the code.
;
; Stack on entry:
;
; $06(a7) = Output file channel id.
; $02(a7) = Source file channel id.
; $00(a7) = How many channels? Should be $02.

start       bra.s   checkStack
            dc.l    $00
            dc.w    $4afb
name        dc.w    name_end-name-2
            dc.b    'QL2UTF8'
name_end    equ     *
version     dc.w    vers_end-version-2
            dc.b    'Version 1.00'
vers_end    equ     *

bad_parameter
            moveq   #err_bp,d0      ; Guess!
            bra     errorExit       ; Die horribly

; Check the stack on entry. We only require NUMCHANS channels - any
; thing other than NUMCHANS will result in a BAD PARAMETER error on
; exit from EW (but not from EX).
checkStack
            cmpi.w  #numchans,(a7)  ; Two channels is a must
            bne.s   bad_parameter   ; Oops


Listing 6.3: Ql2utf8: Job header and initialisation


Next up is some initialisation. In this short section of code, a couple of registers are set to values 
which will be used throughout the entire utility. 


; Initialise a couple of registers that will keep their values all
; through the rest of the code.

ql2utf8
            lea     utf8,a2         ; Preserved throughout
            moveq   #timeout,d3     ; Timeout, also preserved


Listing 6.4: Ql2utf8: Initialising constant registers


And now we have the top of the main loop for the program. We start here by initialising the various
registers to be able to read a single byte from the input channel. The ID for that file is on the stack
at offset 2 from the current value in register A7.


Once a byte has been read we check the error code in D0, and if it shows no errors, we can get
on with the translation. If D0 is showing an error, and it happens to be End Of File, we bale out
of the program and return success to SuperBASIC. Other errors will return the appropriate error
code to SuperBASIC, but that will only be seen if the utility was executed with EXEC_W or EW,
or equivalent.


; The main loop starts here. Read a single byte, check for EOF etc.

readLoop
            moveq   #io_fbyte,d0    ; Fetch one byte
            move.l  sourceId(a7),a0 ; Channel to read from
            trap    #3              ; Do input
            tst.l   d0              ; OK?
            beq.s   testBit7        ; Yes
            cmpi.l  #err_eof,d0     ; All done?
            beq     allDone         ; Yes.
            bra     errorExit       ; Oops!


Listing 6.5: Ql2utf8: Top of the loop - reading bytes


The first check is to test whether bit 7 of the character just read is set or not. If it is set then the
chances are that it is a multi-byte character. If it is clear, then we continue processing.


testBit7
            btst    #7,d1           ; Bit 7 set?
            bne.s   twoBytes        ; Multi Byte character if so


Listing 6.6: Ql2utf8: One byte? Or More?


Right then, at this point the top bit must be clear, so we are looking at a single byte character, 
or are we? The QL has a few little exceptions to the rule as it uses different character codes to 
standard (if there is such a thing) ASCII. 


The first exception is the UK Pound sign, which is a two byte character in UTF8. The code below 
checks and processes a Pound sign, if one is found. After writing out the UTF8 codes, it loops 
back to the start of the main loop, ready for the next character. 


; The UK Pound and copyright signs are exceptions to the "bytes
; less than $80 are the same in UTF8 as they are in ASCII" rule as
; Sir Clive didn't follow ASCII 100%. Both characters are multi-byte
; in UTF8.

testPound
            btst    #7,d1           ; Potential multi-byte character?
            bne.s   twoBytes        ; Yes.
            cmpi.b  #pound,d1       ; Got a UK Pound sign?
            bne.s   testCopyright   ; No.

gotPound
            move.b  #$c2,d1         ; Pound is $C2A3 in UTF8.
            bsr.s   writeByte       ; Write first byte
            move.b  #$a3,d1
            bsr.s   writeByte       ; Write second byte
            bra.s   readLoop


Listing 6.7: Ql2utf8: Handling the UK Pound


The next exception is the copyright symbol. It too is a multi byte character in UTF8 so the code 
below checks for it and deals with it appropriately. 


; Here we repeat the same check as above, in case we have the
; copyright sign.

testCopyright
            cmpi.b  #copyright,d1   ; Got a copyright sign?
            bne.s   oneByte         ; No.

gotCopyright
            move.b  #$c2,d1         ; Copyright is $C2A9 in UTF8.
            bsr.s   writeByte       ; Write first byte
            move.b  #$a9,d1
            bsr.s   writeByte       ; Write second byte
            bra.s   readLoop


Listing 6.8: Ql2utf8: Handling copyright


That’s all the QL characters that are exceptions to the “ASCII characters below code 128 are
single byte in UTF8” rule. The remaining QL characters less than code 128 are dealt with by
simply calling the routine to write a single byte and then heading back to the top of the main loop.
Job done.


; All other ASCII characters, below $80, are single byte in UTF8 and
; are the same code as in ASCII.

oneByte
            bsr.s   writeByte       ; Single byte required in UTF8
            bra.s   readLoop


Listing 6.9: Ql2utf8: Handling low value ASCII codes


Speaking of writing a single byte, the following code does exactly that. It fetches the channel ID 
for the output channel from the stack. Normally, this would be at offset “destId” on from A7, but 
as this code is always called as a subroutine, there is an extra 4 bytes on the stack for the calling 
code’s return address, so that has to be considered. 


All the following snippet has to do is set up the registers to enable the trap call, IO_SBYTE, to be 
called. D3, the timeout, is already set to -1, and will be preserved on return, as will D2, which is 
being used elsewhere in the code to safely hold a value during processing. 


; A small but perfectly formed subroutine to send the byte in D1 to
; the output channel.
;
; BEWARE: This is called with an extra 4 bytes on the stack!

writeByte
            moveq   #io_sbyte,d0    ; Send one byte
            move.l  4+destId(a7),a0 ; Output channel id
            trap    #3
            tst.l   d0              ; OK?
            bne.s   errorExit       ; Oops!
            rts


Listing 6.10: Ql2utf8: Writing one byte of UTF8


As mentioned above, we have processed all the QL characters that are a single byte in UTF8,
so now we need to think about those characters with codes above 127. The majority of these are
accented characters and, as the QL doesn’t cover all the “standard” ones, there is some “furkling
about”² to be done.


The QL wouldn’t be the QL we know and love if there were not a couple of exceptions to the rule 
that “ASCII characters above code 128 are always multi-byte”. The grave (no, not somewhere you 
bury people, the accent much loved by the French I believe) aka the backtick (at least on Unix, 
Linux etc) is actually a single byte character in UTF8, so that is dealt with first. 


We arrive at the following code whenever a character is read in which has the top bit, bit 7, set. 


The code begins by checking for and processing a grave character. 


; ASCII codes from $80 upwards require multiple bytes in UTF8. In the
; case of the QL, these are mostly 2 bytes long. I could use IO_SSTRG
; here, I know.
;
; However, as ever, there are exceptions. The Grave accent (backtick)
; is a single byte on output, while the 4 arrow keys are three bytes.
; The bytes to be sent are read from a table because, again, the QL
; is not using the full set of accented characters - so there is
; mucking about to be done.

twoBytes
            cmpi.b  #grave,d1       ; Backtick/Grave accent?
            bne.s   testEuro        ; No.

; We are dealing with a backtick character (aka Grave accent).

gotGrave
            move.b  #pound,d1       ; Grave in = pound out!
            bsr.s   writeByte       ; Single byte required
            bra     readLoop        ; Do the rest


Listing 6.11: Ql2utf8: Handling exceptions - the Grave/backtick

² That would be a technical term!


From here on in we should be dealing with all the two byte characters for UTF8, however, those
exceptions are popping up again. The first is the Euro symbol. This is missing from the original
128Kb QLs of old, as the Euro didn’t even exist when they were conceived. However, in SMSQ,
it has been allocated character 181 - which, when you look at it in Pennel or similar, is a
seriously weird character which I’ve never seen used, so I think the SMSQ authors chose well!


In UTF8 the Euro needs three bytes, $E282AC, so the following section of code does the
necessary checking and handling of a Euro character.


; Here we repeat the same check as above, in case we have the
; Euro sign.

testEuro
            cmpi.b  #euro,d1        ; Got a Euro sign?
            bne.s   testArrows      ; No.

gotEuro
            move.b  #$e2,d1         ; Euro is $E282AC in UTF8.
            bsr.s   writeByte       ; Write first byte
            move.b  #$82,d1
            bsr.s   writeByte       ; Write second byte
            move.b  #$ac,d1
            bsr.s   writeByte       ; Write third byte
            bra.s   readLoop


Listing 6.12: Ql2utf8: Handling exceptions - the Euro Currency symbol


Finally, in our exception handling code, the 4 arrow keys. These too are three bytes long in UTF8, 
$E2869x, where the ’x’ nibble is 0, 1, 2 or 3 depending on the arrow’s direction. Just to be 
awkward, the QL’s arrow order is different to UTF8 - on the QL the ascending character codes are 
for the Left, Right, Up, Down arrows, but in UTF8 they are ordered Left, Up, Right, Down. 


The code snippet below handles the arrow keys. 


; The arrows are $BC, $BD, $BE and $BF (left, right, up, down). These
; are three bytes in UTF8, $E2 $86 $9x, where 'x' is 0, 2, 1 or 3.

testArrows
            move.b  d1,d2           ; Copy character code
            subi.b  #$bc,d2         ; Anything lower = C set
            bcs.s   notArrows       ; And is not an arrow
            subi.b  #4,d2           ; Arrows = 0-3. C clear is bad
            bcc.s   notArrows       ; Still not an arrow.

gotArrows
            subi.b  #$bc,d1         ; D1 = 0 to 3
            lea     arrows,a3       ; Arrow table
            move.b  d1,d2           ; Save index into table
            ext.w   d2              ; Need word not byte

            move.b  #$e2,d1         ; First byte
            bsr.s   writeByte
            move.b  #$86,d1         ; Second byte
            bsr.s   writeByte
            move.b  0(a3,d2.w),d1   ; Third byte
            bsr.s   writeByte
            bra     readLoop        ; Go around again.


Listing 6.13: Ql2utf8: Handling exceptions - the arrow characters

The arrow key’s third byte is located in the following tiny table which has the correct third byte 
for the appropriate arrow’s code on the QL. 

; We need this as arrows in the QL are Left, Right, Up, Down but in
; UTF8 they are Left, Up, Right, Down. Sigh.

arrows
            dc.b    $90,$92,$91,$93 ; Awkward byte order!


Listing 6.14: Ql2utf8: The arrow character table
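
The arrow handling can be checked away from the QL with a few lines of Python. This is only a sketch of the same lookup, with invented names. It reuses the four table bytes from the listing and compares the results against the standard Unicode arrows U+2190 to U+2193.

```python
# Third UTF8 byte for QL codes $BC..$BF (left, right, up, down).
ARROW_THIRD_BYTE = bytes([0x90, 0x92, 0x91, 0x93])


def ql_arrow_to_utf8(ql_code):
    """Return the three UTF8 bytes for a QL arrow code, else None."""
    index = ql_code - 0xBC
    if not 0 <= index <= 3:
        return None
    return bytes([0xE2, 0x86, ARROW_THIRD_BYTE[index]])
```

The range test on `index` plays the same part as the two carry flag checks in the assembly code, as anything outside 0 to 3 is not an arrow.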

That is now all the exceptions to the two byte rule catered for. The remainder of the higher ASCII
characters are all two bytes in size. Obviously, being the QL, these are not in the same order as the
originating ASCII codes would be, had Sir Clive done the decent thing and used a standard ASCII
code page!³ Instead he chose to omit some characters and rearrange the others into a non-standard
order.

The following code simply copies the character code from D1 to D2 and then manipulates D2 to
go from an index into the table, to an offset into the table where a pair of bytes can be found that
represent the UTF8 code for the current character.

As we are dealing with character codes from 128 ($80) onwards, we start by subtracting $80 from
the character code. This gives the correct index into the table. As each entry in the table is two
bytes, we double the index to get the correct offset, then pick up the two bytes there and send them
on their way to the output file, before heading back to the start of the main loop.
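
The index to offset arithmetic is easy to try out in Python. The sketch below is an illustration only. The function name is made up, and the table holds just the first two entries (a umlaut and a tilde) rather than the full set.

```python
# First two table entries only: QL $80 (a umlaut) and $81 (a tilde).
UTF8_TABLE = bytes.fromhex("c3a4" "c3a3")


def two_byte_lookup(ql_code, table=UTF8_TABLE):
    """Return the two UTF8 bytes stored for a QL code of $80 or above."""
    index = ql_code - 0x80   # Adjust for table index
    offset = index * 2       # Two bytes per entry
    return table[offset:offset + 2]
```

With the full table in place the same two steps, subtract $80 then double, would find the entry for any code up to the last one in the table.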

; Now we are certain, everything is two bytes. Read them from the
; table and write them out.

notArrows
            move.b  d1,d2           ; D2 = byte just read
            subi.b  #$80,d2         ; Adjust for table index
            ext.w   d2              ; Word size needed
            lsl.w   #1,d2           ; Double D2 for Offset
            move.b  0(a2,d2.w),d1   ; First byte
            bsr.s   writeByte       ; Send it out
            addq.b  #1,d2
            move.b  0(a2,d2.w),d1   ; Second byte
            bsr.s   writeByte       ; Send it out too
            bra     readLoop        ; Go around.


Listing 6.15: Ql2utf8: Two byte characters

³ OK, fair play, there probably wasn’t a standard ASCII code page he could use back then.


The code below is the usual tidy up and bale out code. It doesn’t require much explanation as you 
will have seen it before, many times. 


; No errors, exit quietly back to SuperBASIC.

allDone
            moveq   #0,d0

; We have hit an error so we copy the code to D3 then exit via a
; forcible removal of this job. EXEC_W/EW will display the error in
; SuperBASIC, but EXEC/EX will not.

errorExit
            move.l  d0,d3           ; Error code we want to return

; Kill myself when an error was detected, or at EOF.

suicide
            moveq   #mt_frjob,d0    ; This job will die soon
            moveq   #me,d1
            trap    #1


Listing 6.16: Ql2utf8: Clean up and exit handling


Finally, the table of two byte values for the multi-byte characters. Those which have a word of
$0000 are exceptions, dealt with elsewhere. The table only goes as far as character 187 ($BB):
the four arrows that follow are handled separately, and everything after character 191 ($BF) is
unprintable and unlikely to ever get into a QL text file. This basically means that if you do
manage to do this, the output will be “undefined” - as they say!
; The following table contains the two byte sequences required for
; QL characters above $80. These are all 2 bytes in UTF8, so quite a
; simple case. (Not when converting UTF8 to QL though!)

utf8
            dc.w    $c3a4           ; a umlaut
            dc.w    $c3a3           ; a tilde
            dc.w    $c3a5           ; a circle
            dc.w    $c3a9           ; e acute
            dc.w    $c3b6           ; o umlaut
            dc.w    $c3b5           ; o tilde
            dc.w    $c3b8           ; o slash
            dc.w    $c3bc           ; u umlaut
            dc.w    $c3a7           ; c cedilla
            dc.w    $c3b1           ; n tilde
            dc.w    $c3a6           ; ae ligature
            dc.w    $c593           ; oe ligature
            dc.w    $c3a1           ; a acute
            dc.w    $c3a0           ; a grave
            dc.w    $c3a2           ; a circumflex
            dc.w    $c3ab           ; e umlaut
            dc.w    $c3a8           ; e grave
            dc.w    $c3aa           ; e circumflex
            dc.w    $c3af           ; i umlaut
            dc.w    $c3ad           ; i acute
            dc.w    $c3ac           ; i grave
            dc.w    $c3ae           ; i circumflex
            dc.w    $c3b3           ; o acute
            dc.w    $c3b2           ; o grave
            dc.w    $c3b4           ; o circumflex
            dc.w    $c3ba           ; u acute
            dc.w    $c3b9           ; u grave
            dc.w    $c3bb           ; u circumflex
            dc.w    $c39f           ; Sharp s, as in ss (German)
            dc.w    $c2a2           ; Cent
            dc.w    $c2a5           ; Yen
            dc.w    $0000           ; Grave accent - single byte
            dc.w    $c384           ; A umlaut
            dc.w    $c383           ; A tilde
            dc.w    $c385           ; A circle
            dc.w    $c389           ; E acute
            dc.w    $c396           ; O umlaut
            dc.w    $c395           ; O tilde
            dc.w    $c398           ; O slash
            dc.w    $c39c           ; U umlaut
            dc.w    $c387           ; C cedilla
            dc.w    $c391           ; N tilde
            dc.w    $c386           ; AE ligature
            dc.w    $c592           ; OE ligature
            dc.w    $ceb1           ; alpha
            dc.w    $ceb4           ; delta
            dc.w    $ceb8           ; theta
            dc.w    $cebb           ; lambda
            dc.w    $c2b5           ; micro (mu?)
            dc.w    $cf80           ; PI
            dc.w    $cf95           ; o pipe
            dc.w    $c2a1           ; ! upside down
            dc.w    $c2bf           ; ? upside down
            dc.w    $0000           ; Euro
            dc.w    $c2a7           ; Section mark
            dc.w    $c2a4           ; Currency symbol
            dc.w    $c2ab           ; <<
            dc.w    $c2bb           ; >>
            dc.w    $c2b0           ; Degree
            dc.w    $c3b7           ; Divide


Listing 6.17: Ql2utf8: The UTF8 “two byte” character table


Chapter 7. Utf82ql Utility


Using the Ql2utf8 utility, from the previous chapter, I now have the ability to edit a QL created text
file on my Linux laptop and, perhaps, to use it in creating a chapter of this ePeriodical. However,
it is also possible that I might just be very used to using my Linux editor and want to do my editing
in Linux. If so, I now need a way to convert the UTF8 text in the edited file back to the character
set desired by the QL - enter the Utf82ql utility.


This utility is yet another example of a “YAF”¹.


The utility reads a text file encoded in UTF8, and converts what it finds back into QL “speak”. It 
is executed in the usual manner: 


ex ram1_utf82ql2_bin, ram1_utf8_txt, ram1_ql_txt
Listing 7.1: Executing utf82ql 


The input file, ram1_utf8_txt will be read in, and each code point converted to the appropriate QL 
single byte, and written out to the ram1_ql_txt file. The latter file will be used on my QPC setup 
on Linux - to be assembled, compiled, etc. 


On with the code. 


The Code 


As ever, my code starts with an introductory header and some equates. This utility is no different 
as you can see below. 


; UTF82QL:
;
; This filter converts UTF8 text files from Linux, Mac or Windows to
; use the SMSQ character set.
;
; EX utf82ql2_bin, input_file, output_file_or_channel
;
; 28/09/2019 NDunbar Created for QDOSMSQ Assembly Mailing List
;
; (c) Norman Dunbar, 2019. Permission granted for unlimited use
; or abuse, without attribution being required. Just enjoy!
;

; How many channels do I want?
numchans    equ     2               ; How many channels required?

; Stack stuff.
sourceId    equ     $02             ; Offset(A7) to input file id
destId      equ     $06             ; Offset(A7) to output file id

; Other Variables
utf8Pound   equ     $c2a3           ; UTF8 Pound sign
qlPound     equ     96              ; QL Pound sign
utf8Grave   equ     96              ; UTF8 Grave code
qlGrave     equ     159             ; QL Grave code
utf8Copyright equ   $c2a9           ; UTF8 copyright
qlCopyright equ     127             ; QL copyright sign
qlEuro      equ     181             ; SMSQ Euro symbol
err_exp     equ     -17
err_bp      equ     -15
err_eof     equ     -10
err_or      equ     -4
me          equ     -1
timeout     equ     -1


Listing 7.2: Utf82ql: Introductory comments

¹ Yet Another Filter!


The code above has a few equates for the various exceptions to the normal rules of ASCII and/or 
UTF8, namely that the UK Pound sign and the copyright sign are both multi-byte in UTF8 but 
single byte below CHR$(128) on the QL. In addition, the grave accent (aka backtick) should be a 
two byte character in UTF8 but is actually just a single byte. I blame Sir Clive Sinclair! 


Moving on, the code proper starts with the obligatory job header, and a couple of lines to handle 
bad parameter errors. 


; Here begins the code.
;
; Stack on entry:
;
; $06(a7) = Output file channel id.
; $02(a7) = Source file channel id.
; $00(a7) = How many channels? Should be $02.

start       bra.s   checkStack
            dc.l    $00
            dc.w    $4afb
name        dc.w    name_end-name-2
            dc.b    'UTF82QL'
name_end    equ     *
version     dc.w    vers_end-version-2
            dc.b    'Version 1.00'
vers_end    equ     *

bad_parameter
            moveq   #err_bp,d0      ; Guess!
            bra     errorExit       ; Die horribly


Listing 7.3: Utf82ql: Job header


As with normal “YAF”s, we should check to determine if we received enough open channels on
the stack at execution time. In this case we desire two channels - one for the UTF8 text and the
output file for the QL text. If we don’t get exactly two channels, we bale out via the bad parameter
handler above.


It should be said that these error returns will only show up if you execute the code with EXEC_W 
or EW as running them under EXEC or EX doesn’t let you see the errors from the job, only from 
the command itself. 


; Check the stack on entry. We only require NUMCHANS channels - any
; thing other than NUMCHANS will result in a BAD PARAMETER error on
; exit from EW (but not from EX).

checkStack
            cmpi.w  #numchans,(a7)  ; Two channels is a must
            bne.s   bad_parameter   ; Oops


Listing 7.4: Utf82ql: Testing for two channels


The next code snippet sets up a few registers which will hold their values throughout the execution 
of the code, so we do this initialisation once, right here, and stop worrying about them from this 
point on. Register A2 will be pointed at a table of two byte, UTF8 code points, D3 will hold the 
infinite timeout value while A4 and A5 will hold the channel IDs for the input and output files 
passed to the utility. 


; Initialise a couple of registers that will keep their values all
; throughout the rest of the code.

ql2utf8
        lea     utf8,a2                 ; Preserved throughout
        moveq   #timeout,d3             ; Timeout also preserved
        move.l  sourceId(a7),a4         ; Channel ID for UTF8 input file
        move.l  destId(a7),a5           ; Channel ID for QL output file


Listing 7.5: Utf82Ql: Initialising constant registers




Now we are into the meaty stuff - the top of the main loop is next. It starts by reading a single byte
from the UTF8 file and, if no errors occurred, skips the error checking code.


If the input file is now exhausted, we are done, and skip to the end where we close the files and 
exit, otherwise there must have been a heinous error detected, so we bale out via “errorExit”. 


; The main loop starts here. Read a single byte, check for EOF etc.

readLoop
        bsr     readByte                ; Read one byte
        beq.s   testBit7                ; No errors is good.
        cmpi.l  #ERR_EOF,d0             ; All done?
        beq     allDone                 ; Yes.
        bra     errorExit               ; Oops!


Listing 7.6: Utf82Ql: The main loop starts


As discussed previously, UTF8 is a multi-byte character set. Each character can be one, two, three 
or four bytes, but the code snippet below is checking for single byte characters which always have 
bit 7 cleared. If bit 7 is set, we are always dealing with multi-byte characters, so we handle those 
elsewhere. 


; Test the top bit here. If it is zero, we are good for most single
; byte characters, otherwise it is potentially multi-byte.

testBit7
        btst    #7,d1                   ; Bit 7 set?
        bne.s   multiBytes              ; Multi Byte character if so


Listing 7.7: Utf82Ql: Testing for one byte UTF characters


As ever, Sir Clive has helped make life a tad difficult for us in modern times, so there are QL based
exceptions to the rules governing conversion of ASCII to UTF8 (and vice versa of course). Here
we start by dealing with the first exception - the grave accent, or backtick, character.


The grave is a single byte UTF8 character, but on the QL it is in a position that would normally 
make it a two byte character. If we found a UTF8 grave, we load D1 with the QL’s ASCII code 
and drop in to the following lines of code. 


; In UTF8 the Grave accent (backtick) is a single byte character but
; the byte value doesn't correspond to that on the QL. In UTF8 it is
; $60 (96) but on the QL it is $9F (159) so this is another Sir
; Clive induced exception!

testGrave
        cmpi.b  #utf8Grave,d1           ; Got a grave?
        bne.s   oneByte                 ; Must be a normal single byte if not a grave.
gotGrave
        move.b  #qlGrave,d1             ; Write a grave character


Listing 7.8: Utf82Ql: Handling exceptions - the grave/backtick character


The grave/backtick is the only single byte exception we need to handle. The following couple
of lines write the character in D1 (here, the grave/backtick) to the output file and loop back
to the head end of the main loop. If the code at “writeByte” detects an error, it will never return
here.


; The byte read is a valid single byte character so it has the exact
; same code in the QL's variation of ASCII, just write it out.

oneByte
        bsr     writeByte               ; Write the byte out
        bra.s   readLoop                ; And continue.


Listing 7.9: Utf82Ql: Handling one byte UTF characters


The code above will be used as a quick “write and loop” entry point for a few more options later 
on when handling two byte exceptions such as the UK Pound and the copyright symbols, as well 
as all the other non-exception single byte characters from the UTF8 file. 


That’s all the single byte processing taken care of. The next section of code starts filtering out the
two and three byte sequences that we need. As explained previously, two byte UTF8 characters
have the first byte’s top 3 bits set to 110 - this next snippet checks for that.


; Most of the remaining characters will be two bytes in UTF8 and one
; byte on the QL. There are a few exceptions though - the Euro and
; the four arrow keys are three bytes long in UTF8.

multiBytes
        move.b  d1,d2                   ; Copy character code
        andi.b  #%11100000,d2           ; Keep top three bits
        cmpi.b  #%11000000,d2           ; Two bytes?
        beq.s   twoBytes                ; Yes.


Listing 7.10: Utf82Ql: Testing for two byte UTF characters


If the byte read in did have 110 in the top three bits, it’s definitely a two byte character, so we skip 
off elsewhere to handle that - and the exceptions of course! 


The next section of code looks for 1110 in the top 4 bits which always indicates a three byte 
character. We are only interested in a few of these though, the Euro and the four arrow keys. 


; We are interested in a few three byte characters, so we check those
; next. These are identified by the top nibble of the first character
; being, in binary, 1110.

testThree
        move.b  d1,d2                   ; Copy character code
        andi.b  #%11110000,d2           ; Keep top four bits
        cmpi.b  #%11100000,d2           ; Three bytes?
        beq.s   threeBytes              ; Yes.


Listing 7.11: Utf82Ql: Testing for three byte UTF characters


As mentioned above, we don’t care about four byte characters as we can’t handle those on the
QL - we don’t have the appropriate characters. The next section of code simply treats all other
first bytes as errors by exiting the utility with an “Out of range” error. Again, you need
EXEC_W to see these errors.




; If we get here, it's not a valid two or three byte character so it
; is, effectively, an error, so we bale out.

        moveq   #err_or,d0              ; Out of range error code
        bra     errorExit               ; And exit with the error.


Listing 7.12: Utf82Ql: Error out on UTF8 four byte characters
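
The three tests on the leading byte can be summarised in a few lines of Python. This is only a model of the decisions made by “testBit7”, “multiBytes” and “testThree”, not a translation of the utility, and the function name utf8_length is my own.

```python
def utf8_length(lead):
    """Classify a UTF8 leading byte using the same bit masks as the listings.

    Returns 1, 2 or 3 for the sequence length, or None for anything the
    utility rejects with "Out of range" (four byte leads and stray
    continuation bytes).
    """
    if (lead & 0b10000000) == 0:            # Bit 7 clear: single byte
        return 1
    if (lead & 0b11100000) == 0b11000000:   # 110xxxxx: two bytes
        return 2
    if (lead & 0b11110000) == 0b11100000:   # 1110xxxx: three bytes
        return 3
    return None                             # Everything else is an error


for lead in (0x41, 0xC2, 0xE2, 0xF0):
    print(f"${lead:02x} -> {utf8_length(lead)}")
```

Note that a stray continuation byte (10xxxxxx) also falls through to the error case, just as it does in the assembly code.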


Moving on. The following code handles the processing required for all two byte UTF8 characters. 
The leading byte is already in D1 but we need the next byte from the file to determine which 
character we have. The two bytes are then merged into a word in register D2. 


; At this point we should have a UTF8 two byte character but we only
; have the first byte in D1. We need the second byte also, so read it
; and check that it is indeed valid.

twoBytes
        move.b  d1,d2                   ; Save the leading byte
        bsr     readByte                ; Read the second byte
        lsl.w   #8,d2                   ; Shift first byte upwards
        or.b    d1,d2                   ; And add the new byte


Listing 7.13: Utf82Ql: Handling UTF8 two byte characters


It’s exception time again. There are rogue characters which are two bytes in UTF8 but should be 
single bytes if Sir Clive had used correct ASCII! The first exception to handle is the UK Pound 
sign. It is always $C2A3 in UTF8 which corresponds to CHR$(96) on the QL. 


; Exception checking. UTF8 codes $C2A3 for the UK Pound and $C2A9 for
; copyright, are not in the table. They are QL codes $60 (96) and $7F
; (127) and are exceptions to the rule that a QL code less than 128
; always has a one byte code in UTF8 - they are both two bytes.

testPound
        cmpi.w  #utf8Pound,d2           ; Got a UK Pound?
        bne.s   testCopyright           ; No
gotPound
        move.b  #qlPound,d1             ; QL Pound code
        bra.s   oneByte                 ; Write it out & loop around


Listing 7.14: Utf82Ql: Handling exceptions - the UK Pound symbol


If it wasn’t a UTF8 UK Pound that we just read, was it a copyright symbol? This has UTF8 code 
$C2A9 and QL CHR$(127), so the next code section handles that. 


testCopyright
        cmpi.w  #utf8Copyright,d2       ; Got a copyright?
        bne.s   doScan                  ; No
gotCopyright
        move.b  #qlCopyright,d1
        bra.s   oneByte                 ; Write it out & loop around


Listing 7.15: Utf82Ql: Handling exceptions - the copyright symbol
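
For anyone who prefers to see the two byte exceptions in a higher level language, the following Python sketch models the merge of the two bytes into a word and the checks for the Pound and copyright signs. The constants are copied from the equates; the function name two_byte_exception is hypothetical and does not appear in the utility.

```python
UTF8_POUND, QL_POUND = 0xC2A3, 96
UTF8_COPYRIGHT, QL_COPYRIGHT = 0xC2A9, 127

def two_byte_exception(lead, second):
    """Merge two UTF8 bytes into a word, as the code does in D2, and
    return the QL code for the UK Pound or copyright sign. Returns None
    when the word is not an exception and needs the table scan instead."""
    word = (lead << 8) | second     # lsl.w #8,d2 followed by or.b d1,d2
    if word == UTF8_POUND:
        return QL_POUND
    if word == UTF8_COPYRIGHT:
        return QL_COPYRIGHT
    return None


print(two_byte_exception(0xC2, 0xA3))   # UK Pound
print(two_byte_exception(0xC3, 0xA4))   # a umlaut, not an exception
```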




Those are all the exceptions in the two byte characters, so the rest should be simple. The word in 
D2 is checked and converted to a QL character code by the subroutine at “scanTable” which will 
be discussed later. If the character is a valid two byte UTF8 character, it will be written out and 
we then return to the top of the main loop. 


; Ok, exceptions processed, do the remaining two byte characters.

doScan
        bsr.s   scanTable               ; Look it up in the table
        cmpi.w  #-1,d0                  ; Not found?
        beq.s   invalidUTF8             ; Yes, D0 was -1 so not found.
validUTF8
        move.b  d0,d1                   ; Get the character code
        bsr.s   writeByte               ; Write it out
        bra     readLoop                ; And continue.


Listing 7.16: Utf82Ql: Two byte UTF8 character handling


On the other hand, if the character is an invalid one, we exit the program with an “Error in
expression” error code, assuming EXEC_W is waiting to retrieve the error of course.


invalidUTF8
        moveq   #err_exp,d0             ; Error in expression
        bra     errorExit               ; Bale out.


Listing 7.17: Utf82Ql: Invalid UTF8 character detected


We are now done processing the two byte UTF8 characters and ready to move on to the three byte 
ones. Of those, we only care about the Euro which is $E282AC and the four arrow keys which are 
$E28690 through to $E28693. 


The next section of code checks that the leading byte in D1 is $E2, saves it into D2, then reads the
second byte into D1. If the second byte is suitable for the Euro or arrow keys, we will continue,
otherwise we bale out, as above, with an invalid UTF8 error.


; At this point we should have a UTF8 three byte character but we
; only have the first byte in D1. We need the second byte also, so
; read it and check that it is indeed valid. Then get the third byte.
; All our three byte characters should have $E2 in the first byte.
;
; The Euro is $E282AC.
; The Arrows are $E2869x where 'x' is 0,1,2 or 3.

threeBytes
        cmpi.b  #$e2,d1                 ; Valid three byte?
        bne.s   invalidUTF8             ; Looks unlikely.
        move.b  d1,d2                   ; Save the first byte
        bsr.s   readByte                ; Get the second byte
        cmpi.b  #$82,d1                 ; Euro second byte?
        beq.s   threeValid              ; Yes.
        cmpi.b  #$86,d1                 ; Arrow second byte?
        bne.s   invalidUTF8             ; No, bale out.


Listing 7.18: Utf82Ql: Three byte UTF8 character handling




This next section of the code merges the second byte into D2, giving us the first word of the three
byte UTF8 code, then reads the third and final byte into D1. If the leading word is not $E282,
we are possibly handling the arrow keys, so we skip off to handle those elsewhere.


threeValid
        lsl.w   #8,d2                   ; Shift first byte upwards
        or.b    d1,d2                   ; And add the new byte
        bsr.s   readByte                ; Get the third byte
        cmpi.w  #$e282,d2               ; Euro possibly?
        bne.s   threeArrows             ; No, try arrows


Listing 7.19: Utf82Ql: Fetching the third byte


We should be handling the Euro here then, so the next snippet of the code checks that the third 
byte is indeed a valid Euro third byte and bales out if not. If it was valid, we set up D1 with the 
SMSQ Euro code, CHR$(181) and skip back to the top of the main loop via the code at “oneByte” 
which writes the character in D1 to the QL text file. 


; We have read $e282 so if we get $ac next, we have the euro. If not
; it's an error in the UTF8 characters that the QL understands.

threeEuro
        cmpi.b  #$ac,d1                 ; Need this for the Euro
        bne.s   invalidUTF8             ; No, error out.
        move.b  #qlEuro,d1              ; QL Euro code
        bra     oneByte                 ; Write it out and continue.


Listing 7.20: Utf82Ql: Handling the Euro Currency symbol


The remaining three byte UTF8 code must be one of the 4 arrow keys. The first two bytes
will be $E286 and the third byte will be one of $90, $91, $92 or $93 - anything else is an invalid
UTF8 character as far as the QL is concerned.


The next code section checks the word in D2 to be sure it’s a potential arrow key. If not, it’s invalid
and we exit with an error. If the code was potentially an arrow character, subtracting $90 will give
us a value between zero and 3 for a valid arrow - if it went negative, we didn’t have an arrow and
we bale out, again, with an error.


So far so good. If the value left in D1 is bigger than 3, it cannot be an arrow so, once again, we
leave the utility with an error code indicating invalid UTF8.


Finally, we must have a valid arrow. UTF8 orders the arrows left, up, right, down ($90 to $93)
but the QL codes $BC to $BF run left, right, up, down, so simply adding $BC to D1 would swap
up and right. Instead, the value in D1 is used as an index into a four byte table of QL arrow
codes, and we send the result to the output QL file by utilising the code at “oneByte” to write it
and head back to the top of the loop.


; The QL arrows are $BC, $BD, $BE and $BF (left, right, up, down).
; In UTF8, $E2869x where x is 0, 2, 1 or 3 to mirror the
; order of the QL arrow codes.

threeArrows
        cmpi.w  #$e286,d2               ; Got a potential arrow code?
        bne.s   invalidUTF8             ; Afraid not, error out.
        subi.b  #$90,d1                 ; D1 is now 0-3 for valid arrows
        bmi.s   invalidUTF8             ; Oops, it went negative
        cmpi.b  #3,d1                   ; Highest arrow code
        bhi.s   invalidUTF8             ; Oops, invalid arrow code.
        ext.w   d1                      ; Index must be a clean word
        move.b  qlArrows(pc,d1.w),d1    ; Convert to QL arrow code.
        bra     oneByte                 ; Write it out and continue.

qlArrows
        dc.b    $bc,$be,$bd,$bf         ; QL codes in UTF8 order: L, U, R, D


Listing 7.21: Utf82Ql: Handling the arrow characters
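
The whole of the three byte handling, Euro and arrows together, can be modelled in Python as below. This is a sketch for checking the byte values, not the utility itself, and the names three_bytes_to_ql and QL_ARROWS are my own. One assumption is worth stating loudly: UTF8 lists the arrows as left, up, right, down ($90 to $93), while the QL codes $BC to $BF are left, right, up, down according to the comment in the listing, so the sketch uses a lookup in UTF8 order rather than a plain addition.

```python
QL_EURO = 181                           # qlEuro equate (SMSQ Euro)

# QL arrow codes arranged in UTF8 order: left, up, right, down.
QL_ARROWS = (0xBC, 0xBE, 0xBD, 0xBF)

def three_bytes_to_ql(b1, b2, b3):
    """Convert a three byte UTF8 sequence to a QL code, or return None
    if it is not the Euro or one of the four arrows."""
    if b1 != 0xE2:                      # All ours start with $E2
        return None
    if b2 == 0x82:                      # Euro is $E282AC
        return QL_EURO if b3 == 0xAC else None
    if b2 == 0x86 and 0x90 <= b3 <= 0x93:
        return QL_ARROWS[b3 - 0x90]     # Arrows are $E28690 to $E28693
    return None


for ch in "\u20ac\u2190\u2191\u2192\u2193":
    code = three_bytes_to_ql(*ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> QL ${code:02x}")
```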


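The rest of the code is a pair of subroutines you have seen before*. The first writes a single byte
to the output file while the second reads a single byte from the UTF8 input file. The write routine
never returns if QDOSMSQ reports an error. The read routine returns with the error code in D0 and
the flags set, so that the caller can tell EOF apart from anything more serious.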


; A small but perfectly formed subroutine to send the byte in D1 to
; the output QL file.
; On Entry, A5 = output channel ID and D3 = timeout.
; On exit, D0 = 0, Z set.
; On error, never returns.

writeByte
        move.l  a5,a0                   ; Get the correct channel ID
        moveq   #io_sbyte,d0            ; Send one byte
        trap    #3
        tst.l   d0                      ; OK?
        bne.s   errorExit               ; Oops!
        rts

; Another perfectly formed subroutine to read one byte into D1
; from the input UTF8 file.
; On Entry, A4 = input channel ID and D3 = timeout.
; On exit, error code is in D0, Z set if no error, and D1.B = character
; just read.

readByte
        move.l  a4,a0                   ; Get the correct channel ID
        moveq   #io_fbyte,d0            ; Fetch one byte
        trap    #3                      ; Do input
        tst.l   d0                      ; OK?
        rts


Listing 7.22: Utf82Ql: Writing and reading bytes


Finally a new section of code which is used to scan the table of two byte UTF8 characters. In the 
following routine, register D0 is being used as the offset into the table and will obviously increase
by two each time we fail to find the UTF8 word we are searching for. If we reach the end of the 
table, indicated by a word of zero, we have a problem and we will exit via “scanDone”. If the 
routine exits through “scanFound” then we have found our character. 


; Scan the UTF8 table looking for the word in D2. If found, we have
; the table offset in D0 and that is then halved to get the index which
; is still $80 below the correct character code - we add to convert.
; Returns with D0 = the character code, or $FFFF to show the end was
; reached and we appear to have an invalid two byte character. A2
; holds the table address. D7 is a working register.

scanTable
        moveq   #0,d0                   ; Current offset into UTF8 table.
scanLoop
        move.w  0(a2,d0.w),d7           ; Fetch current table entry
        beq.s   scanDone                ; Yes, zero = not found
        cmp.w   d2,d7                   ; Found it yet?
        beq.s   scanFound               ; Yes
        addq.w  #2,d0                   ; No, next offset
        bra.s   scanLoop                ; Keep looking


Listing 7.23: Utf82Ql: Scanning for UTF8 words


* You will have seen them before if you read the code in the previous chapter, that is!


If we get to the next snippet of code, we have found the word we were searching for in the table.
D0 is still the offset into the table, so if we divide by two, we get the index into the table instead.
As the first character in the table is CHR$(128) (aka $80) adding that value to the index found
gives the correct character code for the QL and we return to the calling code with D0 holding the
QL character to be written out.


; The offset in D0 is where we found the UTF8 word we wanted. Halve
; it to get the index into the table, then add $80 to get the correct
; code for the character on the QL.

scanFound
        lsr.w   #1,d0                   ; Convert offset to index
        add.w   #$80,d0                 ; Convert to character code
        rts


Listing 7.24: Utf82Ql: UTF8 character found


We didn’t find the required word in the table, so we return with D0 holding -1 which is not a valid
character code.


; UTF8 word not found, panic!

scanDone
        moveq   #-1,d0                  ; Not found
        rts


Listing 7.25: Utf82Ql: Missing UTF8 word
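
The scan, the halving of the offset and the addition of $80 are easy to get wrong by one, so here is the same logic as a Python sketch that can be tried out away from the QL. The function name scan_table and the cut down DEMO_TABLE are mine; the real table appears at the end of the utility.

```python
def scan_table(table, word):
    """Linear scan of a zero terminated list of 16 bit UTF8 words.

    Mirrors scanTable: 'offset' plays the part of D0 and steps by two
    bytes per entry. Returns the QL character code, or -1 if the zero
    terminator is reached first.
    """
    offset = 0
    while True:
        entry = table[offset // 2]      # move.w 0(a2,d0.w),d7
        if entry == 0:                  # End of table, not found
            return -1
        if entry == word:               # Found it
            return (offset >> 1) + 0x80 # lsr.w #1,d0 then add.w #$80,d0
        offset += 2                     # addq.w #2,d0


# First four entries only, then the terminator.
DEMO_TABLE = [0xC3A4, 0xC3A3, 0xC3A5, 0xC3A9, 0x0000]

print(hex(scan_table(DEMO_TABLE, 0xC3A9)))  # e acute, prints 0x83
print(scan_table(DEMO_TABLE, 0xC2A2))       # not in the demo table, prints -1
```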


The following code is the usual tidy up and handle errors, or otherwise code, much loved by me
and my “YAF”s!


; No errors, exit quietly back to SuperBASIC.

allDone
        moveq   #0,d0

; We have hit an error so we copy the code to D3 then exit via a
; forcible removal of this job. EXEC_W/EW will display the error in
; SuperBASIC, but EXEC/EX will not.

errorExit
        move.l  d0,d3                   ; Error code we want to return

; Kill myself when an error was detected, or at EOF.

suicide
        moveq   #mt_frjob,d0            ; This job will die soon
        moveq   #me,d1
        trap    #1


Listing 7.26: Utf82Ql: Clean up and exit handling


And finally for this utility, the table of values for valid UTF8 two byte characters for the QL codes
between 128 and 187 ($80 to $BB), which are the only ones the QL will be able to cope with. Some
values are set to $FFFF which simply indicates that this QL character is an exception handled in
the code, so that entry in the table will never be matched by the scan. Those are the
Grave/backtick and the Euro characters.


A word of zero indicates the end of the table. 


; The following table contains the two byte sequences required for
; QL characters from character $80 onwards. Those flagged as $FFFF
; are exceptions, dealt with in the code. There are no entries for
; the arrow keys as they would simply be zero words at the end of the
; table.

utf8
        dc.w    $c3a4                   ; a umlaut
        dc.w    $c3a3                   ; a tilde
        dc.w    $c3a5                   ; a circle
        dc.w    $c3a9                   ; e acute
        dc.w    $c3b6                   ; o umlaut
        dc.w    $c3b5                   ; o tilde
        dc.w    $c3b8                   ; o slash
        dc.w    $c3bc                   ; u umlaut
        dc.w    $c3a7                   ; c cedilla
        dc.w    $c3b1                   ; n tilde
        dc.w    $c3a6                   ; ae ligature
        dc.w    $c593                   ; oe ligature
        dc.w    $c3a1                   ; a acute
        dc.w    $c3a0                   ; a grave
        dc.w    $c3a2                   ; a circumflex
        dc.w    $c3ab                   ; e umlaut
        dc.w    $c3a8                   ; e grave
        dc.w    $c3aa                   ; e circumflex
        dc.w    $c3af                   ; i umlaut
        dc.w    $c3ad                   ; i acute
        dc.w    $c3ac                   ; i grave
        dc.w    $c3ae                   ; i circumflex
        dc.w    $c3b3                   ; o acute
        dc.w    $c3b2                   ; o grave
        dc.w    $c3b4                   ; o circumflex
        dc.w    $c3ba                   ; u acute
        dc.w    $c3b9                   ; u grave
        dc.w    $c3bb                   ; u circumflex
        dc.w    $ceb2                   ; B as in ss (German)
        dc.w    $c2a2                   ; Cent
        dc.w    $c2a5                   ; Yen
        dc.w    $ffff                   ; Grave accent - single byte
        dc.w    $c384                   ; A umlaut
        dc.w    $c383                   ; A tilde
        dc.w    $c385                   ; A circle
        dc.w    $c389                   ; E acute
        dc.w    $c396                   ; O umlaut
        dc.w    $c395                   ; O tilde
        dc.w    $c398                   ; O slash
        dc.w    $c39c                   ; U umlaut
        dc.w    $c387                   ; C cedilla
        dc.w    $c391                   ; N tilde
        dc.w    $c386                   ; AE ligature
        dc.w    $c592                   ; OE ligature
        dc.w    $ceb1                   ; alpha
        dc.w    $ceb4                   ; delta
        dc.w    $ceb8                   ; theta
        dc.w    $cebb                   ; lambda
        dc.w    $c2b5                   ; micro (mu?)
        dc.w    $cf80                   ; PI
        dc.w    $cf95                   ; O pipe
        dc.w    $c2a1                   ; ! upside down
        dc.w    $c2bf                   ; ? upside down
        dc.w    $ffff                   ; Euro
        dc.w    $c2a7                   ; Section mark
        dc.w    $c2a4                   ; Currency symbol
        dc.w    $c2ab                   ; <<
        dc.w    $c2bb                   ; >>
        dc.w    $c2ba                   ; Degree
        dc.w    $c3b7                   ; Divide

        dc.w    $0000                   ; End of table


Listing 7.27: Utf82Ql: The UTF8 “two byte” character table
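
If you ever need to check or extend a table like this one, the words can be generated rather than looked up by hand. The Python sketch below is one way of doing it, under the assumption that you know the Unicode character you want. The function name utf8_word is my own, and the handful of characters shown are just a sample, not the full table.

```python
def utf8_word(ch):
    """Return the two byte UTF8 encoding of one character as a 16 bit
    word, first byte in the high half, as stored in the table."""
    data = ch.encode("utf-8")
    if len(data) != 2:
        raise ValueError(f"U+{ord(ch):04X} is not a two byte UTF8 character")
    return int.from_bytes(data, "big")


# A few sample entries, printed in the style of the listing.
SAMPLES = (
    ("a umlaut", "\u00e4"),
    ("oe ligature", "\u0153"),
    ("alpha", "\u03b1"),
    ("Divide", "\u00f7"),
)

for name, ch in SAMPLES:
    print(f"        dc.w    ${utf8_word(ch):04x}                   ; {name}")
```

Characters that need one or three bytes, such as the grave or the Euro, raise an error here, which is a handy reminder that they have to be treated as exceptions in the code.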


The front cover image on this ePeriodical is taken from the book Kunstformen der Natur by Ger- 
man biologist Ernst Haeckel. The book was published between 1899 and 1904. The image used 
is of various Polycystines which are a specific kind of micro-fossil. 


I have also cropped the image for use on each chapter heading page. 


You can read about Polycystines on and there is a brief overview of the above book, 
also on , which shows a number of other images taken from the book. (Some of which 
I considered before choosing the current one!) 


Polycystines have absolutely nothing to do with the QL or computing in general - in fact, I suspect 
they died out before electricity was invented - but I liked the image, and decided that it would 
make a good cover for the book and a decent enough chapter heading image too. 


Not that I am suggesting, in any way whatsoever, that we QL fans are ancient.


